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Methods  for  Large-scale  Nonlinear  Optimisation! 


Philip  E.  Gill,  Walter  Murray,  Michael  A.  Saunders  and  Margaret  H.  Wright 

Systems  Optimisation  Laboratory 
Department  of  Operations  Research 
Stanford  Unirersity 
Stanford,  CA  94305 


ABSTRACT 

The  application  of  optimisation  to  electrical  power  technology  often  re¬ 
quires  the  numerical  solution  of  systems  that  are  rery  large  and  possibly  non- 
differentiable.  A  brief  surrey  of  the  state  of  the  art  of  numerical  optimisation 
is  presented  in  which  those  methods  that  are  directly  applicable  to  power  sys¬ 
tem  problems  will  be  highlighted.  The  areas  of  current  research  that  are  most 
likely  to  yield  direct  benefit  to  practical  computation  are  identified.  The  paper 
concludes  with  a  surrey  of  arailable  software. 


t  Presented  at  the  SIAM  conference:  "Electric  Power  Problems:  the  Math¬ 
ematical  Challenge,”  Seattle,  March  1980. 
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$2  METHODS  POR  DEN8E  PROBLEMS  1 

1.  Introduction 

The  application  of  optimisation  to  the  electrical  power  industry  has  produced 
problems  in  which  the  matrices  of  constraint  gradients  and  second  derivatives 
are  large  and  sparse.  There  are  two  approaches  to  solving  large-scale  problems: 
methods  for  small  dense  problems  can  be  adapted  to  solve  larger  problems  by 
using  sparse  matrix  techniques;  or  special-purpose  methods  can  be  used  to  reduce 
the  problem  to  a  related  sequence  of  simpler  problems.  In  this  paper  we  shall 
review  methods  that  follow  the  first  approach.  In  the  first  three  sections  we 
define  a  number  of  “prototype*  methods  that  give  the  best  rate  of  convergence 
when  applied  to  small  problems,  but  are  easily  adapted  to  cater  for  sparsity.  It  is 
demonstrated  how  sparse  matrix  techniques  can  be  used  to  extend  the  prototype 
algorithms  to  solve  several  important  types  of  large-scale  problems  —  particularly 
problems  in  which  a  small  number  of  the  variables  appear  nonlinearly. 

Unfortunately  many  electrical  power  problems  occur  with  a  large  number 
of  nonlinear  constraints  (see  Biggs  and  Laughton,  1976).  This  is  precisely  the 
class  of  problems  which  are  the  most  difficult  to  solve  numerically.  In  Section 
5  we  shall  discuss  current  research  that  shows  promise  of  allowing  such  difficult 
problems  to  be  solved. 

The  mathematical  description  of  the  problem  of  concern  is: 

minimise  F(z) 

*€®n 

subject  to  c,(z)  >0,  i  =  1, 2, . . , ,  m. 

where  F(z)  (the  objective  function)  and  {c,{x))  (the  constraint  functions)  are 
twice  continuously  differentiable. 

In  the  most  difficult  problems,  each  e,{z)  is  a  nonlinear  function.  Algorithms 
will  also  be  described  for  problems  in  which  the  set  {c»(z)}  is  empty  —  the  un¬ 
constrained  optimisation  problem ;  and  the  constraints  are  all  linear  functions  of 
x  —  the  linearly  constrained  problem. 

The  n-dimensional  column  vector  p(z)  denotes  the  gradient  of  F(z),  and 
G{z)  denotes  the  Hessian  matrix  of  F(z).  The  vector  z  refers  to  a  solution  of 
the  optimization  problem. 

It  is  impossible  for  a  short  survey  to  give  an  adequate  description  of  all 
the  recent  developments  in  a  field  of  study  that  has  grown  substantially  in  com¬ 
plexity  and  range  during  the  last  few  years.  It  is  intended,  however,  that  the 
references  cited  in  the  text  will  enable  the  reader  to  become  familiar  with  the 
more  substantial  and  detailed  accounts  of  the  subject. 

2.  Methods  for  dense  problems 

All  the  algorithms  discussed  in  this  paper  generate  a  sequence  of  estimates  {z*} 
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of  the  optimal  point  z.  Many  do  10  by  generating  a  search  direction  p*  and  a 
step  length  a*  such  that 

**+ 1  =  **  +  QkPkl 

those  that  have  the  additional  property  that  F(z*+i)  <  F(x*)  are  known  as 
descent  methods.  A  complete  description  of  the  methods  used  to  compute  an  is 
beyond  the  scope  of  this  paper.  However,  it  mutt  be  emphasised  that  the  robust* 
ness  of  an  optimisation  method  will  depend  upon  the  method  used  to  compute 
Ok  (see  Ortega  and  Rheinboldt,  1070;  Gill  et  a).,  1079). 


2.1  Methods  for  unconstrained  minimisation. 

A  useful  technique  in  the  optimisation  of  differentiable  functions  is  the  utiliza¬ 
tion  of  a  local  quadratic  model  of  the  objective  function.  Methods  based  on  such 
approximations  should  be  expected  to  work  efficiently  for  quadratic  functions. 
One  obvious  candidate  for  a  quadratic  model  of  the  objective  function  is  the 
function  obtained  by  taking  the  first  three  terms  of  the  Taylor-series  expansion 
about  the  current  point,  i.e. 

F(zk  +  p)  «  Fh  +  g Ip  +  jPrGkP, 

where  G*  denotes  the  Hessian  matrix  evaluated  at  the  point  x*.  The  idea  under* 
lying  many  methods  for  constrained  minimisation  is  that  this  approximation  is  a 
reasonable  model  of  the  objective  function,  in  terms  of  finding  a  local  minimum. 
To  make  this  analogy,  it  is  helpful  to  consider  the  quadratic  model  in  terms  of 
p  rather  than  z.  Consider  the  quadratic  function 

Qip)  -  9kP  +  \prGkp, 

for  a  fixed  vector  gk  and  symmetric  matrix  G*.  If  <7*  is  positive  definite,  Q(p) 
has  a  unique  minimum  at  the  stationary  point  p*  that  satisfies  GkPk  -}-  p*  =  0, 
or  equivalently 

GkPk  =  -9k-  (1) 

This  choice  of  search  direction  is  not  suitable  for  general  application  since  any 
possible  indefiniteness  in  Gk  may  result  in  Ph  not  being  a  descent  direction,  i.e. 
there  may  not  exist  an  a  such  that  F(z*  +  op*)  <  F*.  The  class  of  modified 
Newton  methods  is  characterised  by  the  definition  of  a  positive-definite  matrix 
<7*  which  is  equal  to  to  G*  on  those  occasions  where  Gk  is  positive  definite,  but  it 
equal  to  a  related  positive-definite  matrix  otherwise  (see  Gill  and  Murray,  1974a; 
Mor4  and  Sorenson,  1979). 
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If  it  i«  not  possible  to  provide  a  subroutine  to  evaluate  the  second  derivatives, 
an  algorithm  with  almost  identical  performance  to  that  of  Newton's  method  can 
be  obtained  by  finding  the  columns  of  the  Hessian  using  finite  differences  of  the 
gradient  vector.  For  details  see  Gill  and  Murray  (1974b). 

A  fundamental  advantage  of  Newton-type  methods  is  that  under  certain 
mild  restrictions  on  the  objective  function,  they  can  be  shown  to  be  convergent 
to  a  local  minimum  of  F.  However,  this  advantage  is  obtained  at  the  cost  of 
evaluating  either  the  second  derivatives  or  n  +  1  gradients  at  each  iteration. 
Moreover,  it  is  necessary  to  solve  the  set  of  linear  equations  (1)  from  scratch  at 
each  iteration.  The  class  of  quasi-Newton  methods  was  designed  to  avoid  both 
of  these  problems  (see  Dennis  and  Mor4,  1977;  Brodlie,  1978).  Here,  a  positive- 
definite  approximate  Hessian  Bk  is  known  at  the  Jb-th  iteration  and  the  direction 
of  search  is  given  by  the  solution  of  the  "quadratic  program”: 

minimise  \pTBkp  +  g\p. 

On  completion  of  the  k- th  iteration,  a  matrix  Bk+i  is  computed  such  that 
Bk+i  —  Bk  is  a  matrix  of  low  rank  —  usually  of  rank  one  or  two,  and  Bk+i 
satisfies  the  quasi-Newton  condition 

Bk+l(*k+ 1  —  =  Pfc+i  ““  9k- 

This  condition  is  approximately  satisfied  by  the  true  Hessian  in  the  neighbour¬ 
hood  of  the  solution.  Although  there  are  an  infinite  number  of  possible  definitions 
of  1  it  is  generally  accepted  that  the  BFGS  formula 

Bk+ 1  —  Bk  +  -~9k9k  H - j—VkVk» 

9iPk  <*kVlPk 

where  y*  =  gk+i  —  gk)  consistently  gives  good  results.  Since  the  approximate 
Hessian  alters  only  by  a  matrix  of  rank  two  at  each  iteration  it  is  possible  to 
update  an  invertible  form  of  Bk-  This  form  may  be  either  a  triangular  factoriza¬ 
tion  or  an  explicit  inverse.  For  numerical  computation  it  is  preferable  to  recur 
the  triangular  factorisation  of  B*  (see  Gill  and  Murray,  1972,  1978,  for  more 
details). 

All  the  methods  discussed  so  far  require  the  storage  of  an  n  X  n  approximate 
Hessian  matrix.  Conjugate-gradient  methods  require  the  storage  of  only  a  few  n- 
vectors.  The  following  algorithm  was  suggested  by  Fletcher  and  Reeves  (1964). 
During  the  first  iteration,  pk  is  just  the  steepest-descent  direction  —g[xo).  After 
the  computation  of  a*,  the  direction  of  search  for  the  next  iteration  is  found 
from  the  formula 

P*+l  =  9k+l  +  PkPkt 
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where  /?*  =  II 0*+ ill!/ II?* Ila-  H  o*  *•  computed  as  the  minimum  of  F{zk  -f  op*) 
and  the  algorithm  is  used  to  minimise  a  quadratic  function  F{z)  =  cTz+  } zTQz 
with  Q  a  symmetric  positive-definite  matrix  and  c  an  n- vector,  the  directions  ob¬ 
tained  from  this  algorithm  are  identical  to  those  of  both  the  Hestenes  and  Stiefel 
conjugate-gradient  method  for  solving  the  linear  equations  Qz  =  — c  (Hestenes 
and  Stiefel,  1952)  and  the  BFGS  quasi-Newton  method.  Unfortunately,  if  the 
approximate  Hessian  matrix  can  be  stored,  the  Fletcher-Reeves  algorithm  is 
generally  inferior  to  the  BFGS  quasi-Newton  algorithm  in  terms  of  the  number 
of  times  that  F  is  evaluated.  However,  recent  research  on  conjugate-gradient- 
type  methods  has  considerably  increased  their  robustness  and  efficiency.  This 
work  has  mainly  consisted  of  deriving  methods  in  which  pk+ 1  is  computed  from 
the  gradients  of  the  previous  r  iterations  by  solving  equations  of  the  form 

( 1  +  ]C  (wf  “  *<MDV*+i  =  -0M- i> 

'  i—  Jk-r+i  ' 

where  D*+i  is  a  diagonal  matrix.  It  can  be  shown  that  the  Fletcher-Reeves 
algorithm  is  the  member  of  this  class  of  methods  obtained  by  using  an  exact 
linear  search  with  r  =  1  and  Dk+i  equal  to  the  identity  matrix.  For  more 
details  see  Shanno  (1978);  Nazareth  and  Nocedal  (1979);  Gill  and  Murray  (1979); 
and  Nocedal  (1979). 

Nonlinear  conjugate-gradient  methods  behave  in  a  similar  way  to  conjugate- 
gradient  methods  for  the  solution  of  systems  of  linear  equations.  These  methods 
work  best  on  problems  whose  Hessian  matrices  have  sets  of  clustered  eigenvalues. 
On  more  general  problems,  however,  even  the  best  method  may  require  a  prohibi¬ 
tively  large  number  of  iterations.  For  general  problems  (i.e.  problems  without 
clustered  eigenvalues)  conjugate-gradient-type  methods  require  significantly  more 
function  evaluations  than  quasi-Newton  methods. 


2.2  Methods  for  linearly  constrained  minimization. 

It  is  common  for  optimisation  problems  to  include  linear  constraints  of  the  form: 
(1)  simple  bounds  on  the  variables, 


^  z%  ^  Ui  i  —  1, . . . ,  b; 
(2)  lineu  constraints  (equality  or  inequality), 


)-l 

01*1  +  oa*a  +  •  •  *  +  On*n <  =  >  P- 

[<) 
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The  most  successful  algorithms  for  linearly  constrained  minimisation  are 
members  of  the  class  of  "active  set”  methods.  At  each  iteration,  a  particular 
subset  of  the  constraints  are  treated  as  "active*.  The  search  direction  is  then 
computed  so  as  to  stay  "on”  these  constraints,  adding  new  ones  to  the  active  set 
as  they  are  encountered.  The  search  direction  on  the  set  of  active  constraints 
is  found  by  solving  a  linearly  constrained  problem  with  the  active  constraints 
treated  as  equalities.  Suppose  that  there  are  t  linearly  independent  active  con¬ 
straints  and  that  their  coefficients  comprise  the  rows  of  the  matrix  A.  We  then 
need  to  solve  the  equality-constraint  problem 

minimise  F{z) 

«€**  (2) 
subject  to  At  —  l. 


Given  any  feasible  point  z*  we  require  a  vector  p*  such  that  zk  +  p*  is  feasible 
and  close  to  a  minimum  of  (2).  This  gives  the  following  quadratic  program  for 


Pk' 


minimise 

)»€»** 

subject  to 


\pTBk?  +  9kP 

Ap  —  0. 


(3) 


Here,  as  in  the  unconstrained  case,  Bk  denotes  some  approximation  to  the 
Hessian  matrix  of  F.  The  t  equality  constraints  have  the  effect  of  reducing  the 
dimensionality  of  the  optimisation  problem  to  n  —  t,  as  follows.  Any  vector 
satisfying  the  constraints  of  (3)  can  be  written  in  the  form  p  =  Zpz,  where 
Z  is  an  n  x  (n  —  t)  matrix  whose  columns  form  a  basis  for  the  set  of  vectors 
orthogonal  to  {fi,}.  Written  in  terms  of  pz,  the  quadratic  program  (3)  becomes 
an  unconstrained  problem  with  optimum  given  by  the  solution  of  the  linear 
equations 

ZTBkZpz  =  —  ZTgk. 


The  solution  pk  of  (3)  is  recovered  as  p*  =  Zpz-  When  second  derivatives  are 
known,  G{zk)  can  often  be  used  as  Bk.  In  this  case  the  matrix  ZTGkZ  is  defined 
as  the  projected  Hessian  matrix  and  is  denoted  by  Gz-  Similarly,  the  (n  —  t)- 
vector  ZTg  is  defined  as  the  projected  gradient  and  is  denoted  by  gz  (see  Gill 
and  Murray,  1974c). 

The  terms  "gradient  projection”  and  "reduced  gradient”  are  often  used 
to  refer  to  this  problem  transformation,  which  effectively  "projects”  F  into  a 
"reduced”  subspace.  Alternatively,  one  can  consider  that  these  methods  generate 
iterates  that  remain  "on”  the  constraints  while  moving  to  decrease  the  objective 
function. 

Any  vector  u  in  ft*  can  be  written  in  the  form  u  =  ATUx  +  Zu2.  At  a  solu¬ 
tion  of  the  equality-constraint  problem  (2)  the  projected  gradient  will  be  sero, 
which  implies  that  the  gradient  vector  has  no  component  in  the  space  spanned 
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by  Z.  Thus  there  must  exist  t  scalars  Xy  such  that 


g(z)  =  ArX. 

The  f- vector  X  is  known  as  the  vector  of  Lagrange  multipliers.  Note  that  X  is 
the  solution  of  a  set  of  over-determined  linear  equations. 

When  a  minimum  on  a  particular  set  of  active  constraints  has  been  found, 
a  decision  must  be  taken  as  to  which  constraint  (if  any)  should  be  deleted  from 
the  active  set.  It  can  be  shown  that  a  Lagrange  multiplier  gives  a  first-order 
estimate  to  the  change  in  F  for  a  unit  perturbation  in  the  constraint.  A  posi¬ 
tive  Lagrange  multiplier  Xy  indicates  that,  to  first  order,  the  objective  function 
cannot  be  reduced  by  moving  off  the  j- th  constraint.  Conversely,  if  the  search 
direction  is  computed  so  that  a  constraint  with  a  negative  multiplier  will  be 
inactive  at  the  next  point,  there  must  exist  a  scalar  a  such  that  F(x*  -f-  apk)  < 

PM- 

It  is  usually  more  efficient  to  avoid  a  complete  minimisation  on  a  subspace 
and  delete  a  constraint  earlier  when  it  seems  appropriate.  A  way  of  investigating 
whether  a  constraint  should  be  deleted  at  points  other  than  at  constrained  sta¬ 
tionary  points  is  to  compute  estimates  of  the  Lagrange  multipliers.  For  example, 
the  estimate  \L  defined  by  the  least-squares  problem 

min  ||ArX  —  gkh 

is  frequently  used.  The  reader  is  referred  to  Gill  and  Murray  (1979)  for  a  detailed 
discussion  of  Lagrange-multiplier  estimates. 

The  apparent  differences  among  algorithms  of  the  reduced-gradient  type 
arise  from  the  various  ways  of  representing  Z,  in  order  to  generate  search  direc¬ 
tions  that  remain  in  the  proper  subspace.  For  small  dense  problems,  the  best 
method  for  computing  Z  is  based  on  the  QR  factorization  of  Ar  (see  Stewart, 
1973).  Let  Q  be  an  n  X  n  orthogonal  matrix  such  that: 

where  ft  is  a  t  x  t  nonsingular  upper-triangular  matrix.  The  last  n—t  rows  of  Q 
can  be  taken  as  the  columns  of  the  matrix  Z,  since  they  are  linearly  independent 
and  orthogonal  to  the  rows  of  A. 

A  basic  feature  of  linearly  constrained  minimization  is  that  A  and  Z  can  be 
updated  since  A  changes  by  only  a  single  row  at  each  iteration.  Consequently, 
an  important  consideration  in  the  choice  of  Z  should  be  the  accuracy  of  the 
computed  quantities  after  a  long  sequence  of  updates.  The  orthogonal  factoriza¬ 
tion  is  by  far  the  most  accurate  method  for  computing  Z.  It  is  the  only  choice 
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of  Z  that  gives  a  constrained  problem  whose  degree  of  difficulty  is  never  worse 
than  that  of  the  original  problem.  For  other  methods  for  computing  Z  and  a 
discussion  of  the  relative  merits  of  alternative  techniques,  see  Gill  and  Murray 
(1974b). 

If  the  linear  constraints  comprise  only  simple  bounds,  it  is  not  necessary 
to  compute  a  matrix  Z  since  columns  of  the  identity  matrix  naturally  define 
a  suitable  basis  for  the  null  space  of  the  constraints.  In  this  case  an  active 
set  strategy  is  equivalent  to  partitioning  the  variables  into  two  sets:  the  set  of 
fixed  variables  which  are  at  their  upper  or  lower  bounds,  and  the  set  of  tree 
variables  which  are  currently  being  optimized.  An  unconstrained  minimization  is 
performed  with  respect  to  the  free  variables.  This  unconstrained  problem  is  al¬ 
tered  occasionally  if  a  free  variable  violates  a  bound  or  a  fixed  variable  is  allowed 
to  become  free.  Clearly,  algorithms  for  bound-constrained  minimization  are 
conceptually  closer  to  algorithms  for  unconstrained  minimization  than  linearly 
constrained  minimization.  However,  since  the  number  of  free  variables  may  be 
much  smaller  than  n,  the  difficulty  of  the  minimization  may  be  significantly 
reduced.  Intuitively,  one  would  expect  that  it  would  be  possible  to  construct 
a  more  efficient  algorithm  by  providing  more  information  about  the  region  in 
which  the  solution  is  expected  to  lie. 


2.3  Methods  for  nonlinear  constraints. 

The  level  of  difficulty  increases  significantly  when  there  are  nonlinear  constraints. 
One  of  the  added  complications  is  that,  given  a  direction  of  search  p *  and  a  point 
Xk  that  satisfies  a  nonlinear  constraint,  there  may  be  no  step  length  a*  such 
that  z*  -f  QkPk  also  satisfies  the  constraint.  Moreover,  methods  that  attempt  to 
follow  the  constraint  boundary  are  no  longer  suitable.  Suppose  that  c»(z)  is  a 
constraint  that  is  almost  satisfied  exactly  at  z*.  The  Taylor-series  expansion  for 
Ci(xk  +  P )  give* 


ct{xk  -f  p)  =  c,(z*)  -f  a,(z*)rP  +  Ofllpll2), 

where  a,(z)  denotes  the  gradient  of  the  constraint  function  c,(z).  As  z*  ap¬ 
proaches  the  solution,  all  changes  in  z  lie  in  the  null  space  of  constraints  and 
at{zk)Tp  0.  Clearly  a  first-order  expansion  of  the  constraints  does  not  ac¬ 
curately  give  Ci{zk  4-  p)  a  *  a  function  of  p. 

At  a  solution  z,  suppose  that  t  of  the  constraints  are  satisfied  exactly  and 
that  these  constraints  are  denoted  by  {2j(z)}.  If  A(z)  denotes  the  matrix  whose 
t-th  row  is  d,(z)r,  there  exist  t  non-negative  scalars  X,  such  that 


A(z)rX  =  ?(£). 
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(For  the  precise  second-order  Kuhn- Tucker  conditions  for  optimality  see  Fiacco 
and  McCormick,  1964;  Powell,  1974.) 

An  important  tool  for  the  understanding  and  solution  of  constrained  prob¬ 
lems  is  the  Lagrangian  function 

L(i  = 

t~l 


The  point  x  is  a  stationary  point  of  the  Lagrangian  with  optimal  multipliers 
since 

V*L(z,  X)  =  g(x)  —  A(z)rX  ==  0. 

Unfortunately,  x  is  not  usually  a  local  minimum  of  the  Lagrangian  function 
(instead  it  may  be  a  saddle  point),  but  it  can  be  shown  that  z  is  the  minimum 
of  L(x,  X)  when  x  is  restricted  to  lie  in  the  linear  subspace 


A(z  —  x)  =  0. 


(4) 


To  illustrate  this,  consider  the  problem 

minimise  F(x)  =  Xjxjj 
subject  to  x\  +  x\  —  2  =  0. 

This  problem  has  the  optimal  Lagrange  multiplier  X  .816497.  Some  lines  of 
equal  function  value  for  the  Lagrangian  function  are  shown  in  Figure  1  together 
with  the  set  of  x  defined  by  (4).  Note  that  the  Lagrangian  function  has  a  saddle 
point  at  x  and  positive  curvature  along  the  linearised  constraint. 

Suppose  that  xk  is  an  estimate  of  z.  The  Taylor-series  expansion  for  c,(x*~f- 
p)  gives 

e.{z*  +p)  =  Ci{xk)  -f  Oi{xk)Tp  +  0(||p||a). 

This  expansion  can  be  used  to  impose  the  requirement  that  xk  -f  p  satisfies  all 
the  constraints  to  first  order,  i.e. 

A(z*)p  >  c[zk). 

Since  the  optimal  Lagrange  multipliers  are  usually  unknown,  the  Lagrangian 
function  must  be  defined  using  estimates,  X.  This  analysis  leads  to  the  following 
linearly  constrained  sub-problem 

minimise  L(z,  X) 

subject  to  A(z*)(z  —  x*)  >  —c{zk). 


(5) 
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Figure  1. 


It  can  be  shown  that  any  term  of  the  form  oTA[zk)v  can  be  added  to  the 
Lagrangian  function  of  (5)  without  altering  the  solution  of  the  sub- problem.  By 
adding  the  term  X  A[zk)z  to  the  Lagrangian  function  we  ensure  that,  as  z* 
approaches  z,  the  optimal  Lagrange  multipliers  of  the  linearly  constrained  sub- 
'  problems  are  equal  to  those  of  the  nonlinearly  constrained  problem.  This  conver¬ 
gence  result  also  implies  that  the  Lagrange  multiplier  vector  from  the  previous 
linearly  constrained  sub-problem  may  be  used  for  X.  Note  that  the  set  of  active 
constraints  is  implicitly  defined  by  the  solution  of  the  sub-problem.  For  more 
theoretical  details  of  this  method  see  Robinson  (1972). 

Rather  than  solve  a  complete  linearly  constrained  problem  at  each  iteration 
we  can  form  a  quadratic  approximation  to  the  Lagrangian  function  and  solve 
the  resulting  inequality  quadratic  program: 


minimise 

pe** 

subject  to 


\?TBhp  +  glp 

M*k)p  >  -c{zk), 


where  £*  now  denotes  an  approximation  to  the  Hessian  of  the  Lagrangian  func¬ 
tion. 

The  conditions  on  p*  are  derived  by  assuming  that  ||p*||  is  small.  To  allow 
for  large  values  of  {|p*||  it  might  seem  appropriate  to  find  a  scalar  a  such  that 
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\\e(xk  -f  op*)||  <  ||c(z*)||.  However,  such  a  step  may  result  in  the  objective  func¬ 
tion  being  increased  from  its  previous  value.  The  solution  to  this  dilemma  is  to 
define  a  “merit*  function  which  allows  a  to  be  adjusted  so  that  the  conflicting 
aims  of  achieving  feasibility  and  reducing  the  objective  function  are  balanced. 
Among  the  possible  choices  for  the  merit  function  are  the  following 

P(z,p)  =  F(z)- H  £  £  c»'(*)a* 

* 1 

P(x,p)-F{x)  +  p^2\ei{x)\. 

Mil 

These  functions  add  a  penalty  to  the  objective  function  that  depends  on  the 
constraint  violation.  Note  that  P  is  not  differentiable. 

In  our  description  we  have  ignored  some  of  the  difficulties  associated  with 
solving  inequality  quadratic  programs  or  linearly  constrained  sub-problems.  For 
example,  the  sub-problem  may  have  no  feasible  point  or  an  unbounded  solution. 
Moreover,  the  quadratic  program  described  here  is  only  one  of  many  alternatives 
that  can  be  posed.  An  alternative  strategy  is  to  define  an  explicit  set  of  active 
constraints  and  solve  an  equality-constraint  quadratic  program.  This  strategy 
has  some  advantages  over  the  scheme  described  here.  However,  we  have  been 
concerned  here  with  a  unified  description  that  will  serve  to  introduce  methods 
for  large-scale  problems.  For  a  selection  of  QP-based  techniques  the  reader  is 
referred  to  Wilson  (1963),  Murray  (1969),  Biggs  (1972),  Garcia  and  Mangasarian 
(1976),  Han  (1976,  1977),  Powell  (1977)  and  Murray  and  Wright  (1978).  A  dis¬ 
cussion  of  the  relative  merits  of  the  alternative  approaches  is  given  by  Murray 
and  Wright  (1980). 


2.4  Methods  for  non-differentiable  functions. 

Although  non-differentiable  problems  are  in  general  more  difficult  to  solve,  a 
distinction  must  be  made  between  a  problem  with  random  discontinuities  in 
functions  or  derivatives,  and  one  in  which  a  great  deal  of  information  is  known 
about  the  nature  of  any  discontinuities.  In  the  former  case,  the  only  algorithms 
available  are  of  the  polytope  type  (see  Nelder  and  Mead,  1962).  In  the  latter 
case,  algorithms  can  take  advantage  of  the  special  structure. 

In  some  well  known  instances,  the  problem  functions  themselves  are  not 
smooth,  but  rather  are  composites  of  smooth  functions.  For  example,  the  fol¬ 
lowing  non-differentiable  function  occurs  frequently,  and  is  constructed  in  a  par¬ 
ticular  way  from  the  set  of  smooth  functions  {/,): 

minimise  max{/i(x),  /3(x), . . . ,  /*»(*)}. 
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There  has  been  much  research  concerning  effective  methods  for  these  and  related 
problems,  and  it  is  therefore  advisable  to  use  a  specialised  algorithm  (see  Wolfe, 
1974;  Murray  and  Overton,  1979).  However,  if  such  an  algorithm  is  not  available, 
this  composite  non-differentiable  problem  can  be  transformed  into  a  smooth,  but 
more  complex,  problem  by  introducing  a  new  variable  xn+i»  which  is  an  upper 
bound  on  all  the  functions  {/,-(z)}.  Then  the  problem  is  given  by 

minimise  Zn+i 

subject  to  ft{x)  <  *«+ 1,  s  =  1, . . . ,  m. 

Note  that  the  original  unconstrained  problem  has  been  transformed  into  a 
nonlinearly  constrained  problem.  In  fact,  all  transformations  of  non-differentiable 
composite  functions  lead  to  a  similar  increase  in  complexity. 

3.  Methods  for  large  sparse  problems 

Wherever  possible,  methods  for  large-scale  optimization  will  be  identical  to  those 
for  the  dense  case,  except  that  sparse  matrix  techniques  will  be  utilized  to  mini¬ 
mize  the  storage  and  number  of  operations  required.  However,  we  shall  often  find 
that  a  method  that  is  best  for  dense  problems  must  be  substantially  altered  if  it  is 
to  be  extended  to  the  sparse  case.  In  many  cases  this  alteration  compromises  the 
theoretical  features  of  the  algorithm  to  the  extent  that  it  is  no  longer  practical. 

In  many  ways,  methods  for  large-scale  nonlinear  optimization  have  similar 
properties  to  methods  for  linear  equations.  It  is  now  generally  accepted  that  the 
most  stable  methods  for  dense  linear  equations  are  not  the  most  efficient  for  the 
sparse  case.  Unfortunately,  this  implies  that  any  method  for  sparse  problems 
may  sometimes  involve  numerical  processes  that  are  not  always  as  numerically 
stable  as  we  would  like.  The  same  situation  applies  to  algorithms  for  general 
nonlinearly  constrained  optimization. 

3.1  Methods  for  sparse  unconstrained  minimisation. 

For  unconstrained  minimisation  it  is  no  longer  clear  that  quasi-Newton  methods 
are  the  most  effective  techniques.  Toint  (1979)  derived  a  sparse  quasi-Newton 
update  of  the  form 

Bk+i  =  Bk  +  Uk, 

where  Uk  is  a  matrix  of  rank  n  that  has  the  same  sparsity  pattern  as  Bk.  In 
order  to  compute  the  correction  Uk  it  is  necessary  to  solve  a  system  of  equations 
with  a  coefficient  matrix  having  the  same  structure  as  Bk-  Unlike  the  dense 
case,  it  is  not  possible  to  guarantee  that  Bk  will  be  positive  definite,  and  conse¬ 
quently  quasi-Newton  methods  are  identical  to  modified-Newton  methods  in  the 
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requirement  for  a  related  positive-definite  matrix  to  define  p* .  Note  that  the 
three  advantages  of  dense  quasi-Newton  methods  over  finite-difference  methods 
have  been  lost  —  the  maintenance  of  positive  definiteness,  the  ability  to  recur 
in  invertible  form  and  the  lower  overhead  per  iteration.  This  reversal  in  perfor¬ 
mance  is  made  more  striking  if  finite-differences  of  the  gradient  vector  are  taken 
along  certain  linear  combinations  of  the  co-ordinate  directions.  This  enables 
the  approximate  Hessian  to  be  found  in  significantly  fewer  than  n  -f- 1  gradient 
evaluations  (see  Curtis  et  a/.,  1971;  Gill  and  Murray,  1973;  Powell  and  Toint, 
1979) 

If  there  are  a  large  number  of  variables  and  the  Hessian  matrix  is  not  sparse 
or  structured,  conjugate-gradient  methods,  instead  of  being  the  lowest  ranked 
algorithms,  become  the  only  algorithms  that  can  be  applied  with  any  chance 
of  success.  Problems  whose  Hessian  matrices  at  the  solution  contain  sets  of 
clustered  eigenvalues  may  be  minimized  in  significantly  fewer  than  n  iterations. 
Problems  without  this  property  may  require  anything  from  between  n  and  5n 
iterations,  with  approximately  2n  iterations  or  fewer  being  a  common  figure  for 
moderately  difficult  problems.  Clearly,  for  a  very  large  unconstrained  problem 
without  structure,  of  the  order  of  several  thousand  variables,  there  is  currently 
no  algorithm  that  can  be  applied  and  be  expected  to  work  efficiently. 


3.2  Methods  for  sparse  linear  constraints. 

In  this  section  we  shall  discuss  methods  for  large-scale  linearly  constrained  min¬ 
imization  that  exploit  the  sparsity  in  A  or  G. 

When  A  is  large  and  sparse  it  is  not  practical  to  update  the  orthogonal 
factorization  of  the  matrix  of  active  constraints  as  constraints  enter  and  leave 
the  basis.  Instead,  the  problem  is  formulated  in  a  slightly  different  way  so  that 
the  extensive  results  on  updating  the  sparse  factorisations  occurring  in  linear 
programming  may  be  utilized. 

Consider  the  problem 

minimize  Fix) 

subject  to  Ax  =  6,  /,  <  z,  <  u,-. 

The  matrix  A  is  now  m  X  n  with  m  <  n.  If  the  original  problem  does  not 
conform  to  this  standard  form  it  can  be  made  to  do  so  by  the  introduction  of 
slack  variables.  In  this  event  the  resulting  matrix  A  will  have  a  special  struc¬ 
ture,  involving  columns  that  are  columns  of  the  identity  matrix.  This  does  not 
seriously  increase  the  storage  requirements  since  the  matrix  A  will  be  stored  in 
packed  form. 

A  typical  set  of  t  active  constraints  must  comprise  the  m  rows  of  Ax  =  b 
and  t  —  m  simple  bounds.  It  is  not  possible  in  the  large  sparse  case  to  compute 
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the  columns  of  Z  explicitly.  If  we  assume  without  loss  of  generality  that  the  last 
n  —  t  variables  are  on  their  bounds,  the  active  constraint  matrix  A  for  a  typical 
iteration  can  be  partitioned  so  that 

.  (B  S  N\ 

A  V  // 

where  B  is  an  m  X  m  “basis"  matrix.  This  partition  effectively  divides  the  vari¬ 
ables  into  three  classes  —  dependent,  independent  and  temporarily  fixed.  The 
following  matrix  Z  is  orthogonal  to  the  rows  of  A: 


l  -B-'S\ 


In  this  case  the  columns  of  Z  are  not  computed  explicitly,  since  the  quantities 
actually  needed  are  products  of  the  form  Zp  and  ZTg,  and  these  can  be  obtained 
by  solving  a  system  of  equations  involving  B  or  BT. 

If  the  matrix  ZTGZ  is  very  large  at  any  trial  solution,  a  conjugate-gradient 
method  must  be  used  to  compute  pz,  since  ZTGZ  will,  in  general,  be  a  dense 
matrix  -  regardless  of  any  possible  sparsity  of  G  and  Z.  Given  the  current 
inefficiency  of  conjugate-gradient  methods  on  problems  without  clustered  eigen¬ 
values,  this  seriously  limits  our  ability  to  solve  general  large-scale  optimisation 
problems. 

Fortunately,  there  is  an  important  class  of  problems  for  which  it  is  known  a 
priori  that  the  projected  Hessian  matrix  can  never  be  greater  than  a  manageable 
size.  These  problems  have  a  small  number  of  variables  that  appear  nonlinearly 
in  the  objective  function,  i.e.  F(z)  is  of  the  form 

F(z)  =  f{x)  -j-  eTt, 

where  x  =  (*i,*a»---»*v)r»  *  =  and  /(z)  denotes  any  differ¬ 

entiable  nonlinear  function.  In  this  case  the  dimension  of  the  projected  Hessian 
matrix  at  the  solution  is  bounded  by  q.  1 S  q  is  small  enough,  a  quasi-Newton 
approximation  to  Gz  can  be  updated  in  the  usual  way.  For  the  complete  details 
of  this  algorithm  see  Murtagh  and  Saunders  (1978). 

Any  linearly  constrained  algorithm  based  on  an  active  set  strategy  requires 
an  initial  feasible  point  to.  This  point  is  best  found  by  linear  programming 
(the  typical  “Phase  1 "  of  the  simplex  method).  Thus  when  minimising  linear 
functions,  an  algorithm  for  large-scale  linearly  constrained  optimisation  should 
be  competitive  with  large-scale  linear  programming  codes. 
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3.3  Methods  for  sparse  nonlinear  constraints. 

The  results  of  the  previous  section  lead  us  to  write  large-scale  nonlinearly  con¬ 
strained  problems  in  the  form 

minimise  /(2)  +  tTl 
subject  to  c(2)  +  A\l  —  bi 
Aj2  -f"  A$t  —  63 
if  <  *»•  <  U,\ 

The  linearly  constrained  sub-problem  analogous  to  (5)  for  this  problem  is  given 
by 


minimise  /(*)  +  cri  —  Xr(c(2)  —  c(2k)  —  A(2k)(2  —  2*)) 

subject  to  c(2k)  -+-  A(2*)(2  —  2*)  -f  Aj*  =  b% 

A^2  A^Z  =  bi 

if  ^  ^  Ui, 

where  A  denotes  the  matrix  of  gradients  of  c(2).  The  essential  feature  of  this 
problem  is  that  the  dimension  of  the  projected  Hessian  matrix  of  the  Lagrangian 
function  is  bounded  by  q.  This  implies  that  each  linearly  constrained  sub¬ 
problem  can  be  solved  using  the  techniques  mentioned  in  the  last  section. 

Murtagh  and  Saunders  (1980)  have  described  a  technique  based  on  solving 
a  similar  set  of  linearly  constrained  problems.  They  add  a  penalty  term  of  the 
form 


£(c(2)  —  e(2jt)  —  i(2*)(2  -  2jk))r(c(2)  -  c(2k)  —  A(  2*)(2  —  2*)) 

to  the  Lagrangian  function  so  that  the  sub-problem  is  more  likely  to  have  a 
bounded  solution. 

As  in  the  linear  constraint  case,  we  must  rely  on  conjugate-gradient  methods 
to  solve  the  sub-problem  if  the  projected  Hessian  matrix  cannot  be  stored  in  the 
machine. 


4.  Numerical  software 

As  the  theoretical  basis  of  algorithms  becomes  more  complex  and  the  scope 
of  optimisation  widens,  it  becomes  more  difficult  for  those  actively  engaged  in 
practical  problem  solving  to  write  their  own  software.  Software  libraries  and 
software  distribution  centers  partially  solve  this  problem  by  making  a  variety  of 
techniques  immediately  available  to  the  non-specialist. 


54 


NUMERICAL  80PTWARE 


15 


Those  interested  in  obtaining  software  should  be  aware  of  the  different 
ways  in  which  mathematical  software  is  disseminated.  Often  a  new  optimisa¬ 
tion  method  is  published  in  a  research  journal  and  the  method  is  accompanied 
by  a  short  program  implementing  the  new  techniques  discussed  in  the  paper. 
Alternatively,  a  program  will  be  obtainable  from  the  author  on  application. 
Usually  this  program  is  written  simply  to  test  out  the  new  theoretical  develop¬ 
ments  and  will  only  work  on  a  few  selected  examples.  In  general,  little  attention 
is  given  to  the  standard  of  the  coding  or  the  documentation.  A  much  better  way 
of  obtaining  mathematical  software  is  from  a  software  library.  This  term  does 
not  mean  simply  a  collection  of  programs  from  varied  sources,  written  in  isola¬ 
tion,  that  are  grouped  together  in  one  place  with  some  uniform  documentation. 
Rather,  a  program  library  is  a  set  of  routines  that  are  conceived  and  written 
within  a  unified  framework,  to  be  available  to  a  general  community  of  users. 

At  this  time  there  are  only  three  organisations  that  distribute  library  quality 
software  for  optimization: 

The  Numerical  Algorithms  Group  (NAG)  Library,  Banbury  Road,  Oxford,  Eng¬ 
land. 

The  NPL  Numerical  Optimisation  Software  Library,  The  Division  of  Numerical 
Analysis  and  Computer  Science,  National  Physical  Laboratory,  Teddington, 
Middlesex,  England. 

The  MINPACK  project,  Applied  Mathematics  Division,  Argonne  National  Lab¬ 
oratory,  Illinois,  USA. 

Before  acquiring  a  piece  of  software  from  a  source  other  than  a  genuine 
software  library,  a  prospective  user  should  obtain  answers  to  the  following  ques¬ 
tions: 

(i)  How  quickly  are  errors  in  the  code  corrected f  If  the  routine  is  the  result  of 
some  piece  of  research,  the  author  may  no  longer  be  employed  by  the  distributing 
organization  or  may  no  longer  be  concerned  with  its  maintenance. 

(ii)  Is  the  routine  written  in  a  high-level  language  that  is  available  at  the 
prospective  users  computer  installation?  Even  the  most  commonly  used  lan¬ 
guages  have  dialects  which  are  only  available  at  certain  installations. 

(iii)  What  changes  will  need  to  be  made  to  the  routine  in  order  to  run  the 
software  at  the  installation ?  The  most  easily  transported  routines  will  be  written 
in  a  portable  high-level  language  such  as  ANSI-Fortran.  It  is  inevitable  that 
some  changes  will  need  to  be  made  since  certain  constants,  such  as  the  relative 
machine  precision,  must  be  tailored  to  the  host  machine  (for  this  reason,  users 
should  be  wary  of  routines  that  require  no  alteration!). 

(iv)  Is  there  adequate  documentation?  We  have  shown  that  there  are  many 
different  problem  categories  in  optimisation.  One  feature  of  the  documentation 
should  be  an  adequate  description  of  the  problem  category  for  which  it  is  in- 
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tended.  Also  the  documentation  should  include  the  results  of  the  application  of 
the  routine  to  a  simple,  but  non-trivial,  example  problem. 


For  other  sources  of  optimisation  software  see  Gill  and  Murray  (1974b,  pp 
242-243)  and  Nazareth  (1978). 

There  are  far  fewer  codes  available  for  large-scale  nonlinear  optimization 
than  codes  for  small  problems.  Several  factors  contribute  to  this  situation. 
Historically,  computational  research  on  large-scale  optimization  has  occurred  in 
the  commercial  field,  with  numerical  work  on  nonlinear  problems  being  more  the 
province  of  “academic*  research.  This  has  occurred  partly  because  the  academic 
community  tends  to  measure  the  quality  of  research  by  the  quantity  of  theoreti¬ 
cal  papers  that  are  published  in  scientific  journals.  This  has  tended  to  discourage 
the  considerable  investment  of  time  needed  to  complete  a  large-scale  optimiza¬ 
tion  code.  (The  emphasis  on  short  theoretical  results  also  goes  some  way  to 
explain  the  large  number  of  published  papers  on  unconstrained  optimization  and 
nonlinear  least-squares  problems.) 

As  a  result  of  this  imbalance,  methods  for  large-scale  nonlinear  optimization 
have  tended  to  be  developed  as  “add  on"  features  of  linear  programming  codes 
when  a  better  approach  may  be  to  consider  linear  programming  as  a  special  case 
of  nonlinear  optimization. 

The  shortage  of  codes  for  large-scale  nonlinear  optimization  is  further  ag¬ 
gravated  by  difficulties  in  disseminating  these  codes  to  the  user  community. 
Methods  for  large  problems  must  include  sophisticated  input  and  output  routines 
in  order  to  handle  the  large  quantities  of  data  associated  with  the  linear  con¬ 
straints.  This  requirement  limits  both  the  degree  of  portability  that  can  be 
achieved  and  the  size  of  the  host  machine. 

In  spite  of  the  difficulties,  some  portable  (or  near  portable)  codes  have  been 
developed  for  large-scale  constrained  optimization.  For  linear  programs,  MINOS 
(Murtagh  and  Saunders,  1978)  and  XMP  (Marsten,  1978)  are  appropriate.  The 
system  MINOS  is  also  designed  to  handle  a  nonlinear  objective  function.  It  nor¬ 
mally  uses  a  quasi-Newton  approximation  to  the  projected  Hessian  ZTGZ,  in 
the  form  RTR,  where  R  is  upper  triangular.  For  problems  where  R  would  be 
extremely  large,  the  software  described  by  Marsten  and  Shanno  (1979)  may  be 
preferable.  However,  at  this  time  we  cannot  expect  a ny  general-purpose  code  to 
be  efficient  on  problems  in  this  class. 

For  problems  with  nonlinear  constraints,  the  algorithm  discussed  in  Section 
3.3  has  been  implemented  in  the  nonlinear  programming  system  MINOS/AUG¬ 
MENTED  (Murtagh  and  Saunders,  1980a,  b).  The  matrix  of  constraint  gradients 
may  be  sparse,  and  there  may  be  a  large  set  of  purely  linear  constraints.  Both 
objective  and  constraint  gradients  must  be  computable.  (This  is  usually  no 
restriction  with  electrical  power  problems.) 
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It  is  a  feature  of  optimal  electrical  power  flow  problems  that  the  number  of 
constraints  active  at  the  solution  is  unlikely  to  be  large.  This  is  because  many 
constraints  are  present  to  prevent  solutions  that  have  no  physical  significance.  A 
model  would  be  considered  to  produce  a  "bad  design”  if  a  significant  number  of 
these  constraints  were  satisfied  at  the  solution.  This  feature  presents  a  serious 
impediment  to  the  routine  solution  of  power  flow  problems  by  the  methods  dis¬ 
cussed  in  Sections  3.2  and  3.3  since  it  is  unlikely  that  the  (n  —  t)  X  (n  —  t)  set 
of  equations 

ZTGZpz  =  -ZTU  (6) 

can  be  solved  in  core.  However,  if  the  Hessian  matrix  and  constraint  gradients 
are  sparse,  matrix-vector  products  of  the  form  ZTGZv  can  be  computed  rela¬ 
tively  cheaply  by  forming,  in  turn,  v%  =  Zv,  t/a  =  Gv i  and  v3  =  ZT v2-  This 
property  can  be  utilized  fully  if  pz  is  found  using  the  linear  conjugate-gradient 
method  (see  Hestenes  and  Stiefel,  1952).  The  linear  conjugate-gradient  algorithm 
is  usually  derived  as  a  direct  method,  in  the  sense  that,  theoretically,  the  exact 
solution  is  found  after  n  —  t  iterations.  However,  in  practice  the  algorithm  be¬ 
haves  more  like  an  iterative  method  since  it  has  the  potential  of  converging  in 
fewer  than,  or  more  than,  n  —  t  iterations. 

Recently,  Dembo  (1979)  has  suggested  a  "truncated  Newton  method”,  in 
which  the  linear  conjugate-gradient  method  is  terminated  even  though  a  solution 
to  (6)  may  not  have  been  determined.  The  last  iterate  of  the  linear  conjugate- 
gradient  algorithm  is  then  used  as  the  direction  of  search.  If  a  single  linear  itera¬ 
tion  is  used,  it  can  be  shown  that  pz  will  be  the  steepest-descent  direction  in  the 
subspace  spanned  by  Z.  If  a  full  n—t  iterations  are  used,  pz  will  be  the  Newton 
direction  defined  by  the  solution  of  (6).  Thus  the  truncated  Newton  algorithm 
computes  a  vector  that  interpolates  between  the  steepest-descent  direction  and 
Newton  direction. 

The  search  direction  defined  by  (6)  is  satisfactory  only  if  ZTGZ  is  positive 
definite.  An  indefinite  matrix  ZTGZ  allows  the  possibility  that  pz  is  not  a 
descent  direction  and  this  may  result  in  convergence  to  a  non-optimal  point. 

An  important  feature  of  the  class  of  modified  Newton  algorithms  described 
in  Section  2  is  the  ability  to  detect  that  ZTGZ  is  not  sufficiently  positive  definite 
and  to  compute  a  satisfactory  descent  direction  regardless.  A  straightforward  ap¬ 
plication  of  a  linear  conjugate-gradient  algorithm  would  not  have  this  property. 
Moreover,  the  linear  conjugate-gradient  algorithm  is  numerically  unstable  when 
applied  to  an  indefinite  system. 

Another  unsatisfactory  feature  of  the  truncated  Newton  method  is  the  use 
of  the  direction  of  steepest  descent.  It  is  well  known  that  the  method  of  steepest 
descent  is  very  inefficient.  Consequently,  unless  a  large  number  of  iterations 
of  the  linear  conjugate-gradient  method  are  used,  the  resulting  search  direction 
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may  have  more  in  common  with  the  steepest-descent  direction  than  the  Newton 
direction. 

Recent  research  has  been  concerned  with  deriving  truncated  Newton  methods 
that  do  not  require  F  to  have  a  uniformly  positive-definite  projected  Hessian.  In 
addition,  they  generate  a  set  of  linear  conjugate  directions  that  are  conjugate 
to  a  vector  other  than  the  steepest-descent  direction.  This  latter  feature  gives 
a  truncated  Newton  method  in  which  pz  interpolates  the  Newton  direction  and 
the  direction  obtained  from  a  nonlinear  conjugate-gradient  algorithm. 
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